Word intuition agreement among Chinese speakers: a Mechanical Turk-based study

Authors

  • Shichang Wang
  • Chu-Ren Huang
  • Yao Yao
  • Angel Chan
Abstract

Word intuition is speakers’ intuitive knowledge of wordhood. Collective word intuition is the word intuition of the whole language community. Given this definition, the optimal word segmentation result in Chinese NLP should reflect collective word intuition. It is also believed that an ideal definition of the Chinese word should accord with the collective word intuition of Chinese speakers. To test the validity and feasibility of modeling collective word intuition, it is important to know to what extent Chinese speakers agree with each other on what a word is. In this study, we measured word intuition agreement using a Mechanical Turk-based Chinese word segmentation experiment. Three metrics were used: proportionate agreement, Cohen’s kappa, and Fleiss’ kappa. The results show that Chinese speakers agree with each other almost perfectly on what a word is. We found no evidence of an effect of semantic transparency on word intuition agreement. Such high word intuition agreement among Chinese speakers supports the psychological reality of the Chinese word and also suggests that it is quite feasible to formulate a definition of the Chinese word by modeling the collective word intuition of Chinese speakers.
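
The three agreement metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not describe how the authors computed them, so the code below rests on an assumption: each gap between two adjacent characters is treated as one binary item (1 = word boundary, 0 = no boundary), and every participant labels every gap. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def proportionate_agreement(a, b):
    """Share of items on which two raters give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters: agreement corrected for chance.

    Undefined (division by zero) when expected agreement is exactly 1.
    """
    n = len(a)
    p_o = proportionate_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's own label distribution.
    p_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(items):
    """Fleiss' kappa; `items` holds one list of labels per item,
    with the same number of raters for every item."""
    n_items = len(items)
    n_raters = len(items[0])
    totals = Counter()
    p_bar = 0.0
    for labels in items:
        counts = Counter(labels)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        p_bar += agreeing / (n_raters * (n_raters - 1))
    p_bar /= n_items
    # Chance agreement from the pooled label distribution.
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical boundary decisions of three raters over six character gaps.
r1 = [1, 0, 1, 0, 1, 1]
r2 = [1, 0, 1, 0, 0, 1]
r3 = [1, 0, 1, 1, 1, 1]

pairwise_agreement = proportionate_agreement(r1, r2)
pairwise_kappa = cohen_kappa(r1, r2)
group_kappa = fleiss_kappa([list(t) for t in zip(r1, r2, r3)])
```

Proportionate agreement and Cohen's kappa compare two raters at a time, while Fleiss' kappa summarizes agreement across the whole group, which is presumably why a study with many participants would report all three.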

Similar resources

Sifu: Interactive Crowd-Assisted Language Learning

This paper introduces SIFU, a system that recruits in real time native speakers as online volunteer tutors to help answer questions from Chinese language learners in reading news articles. SIFU integrates the strengths of two effective online language learning methods: reading online news and communicating with online native speakers. SIFU recruits volunteers from an online social network rathe...

Bilingualism, Biliteracy and Metalinguistic Awareness: Word Awareness in English and Japanese Users of Chinese as a Second Language

Cross-linguistic research shows that some aspects of metalinguistic awareness are affected by characteristics of different writing systems. Users of writing systems that mark word boundaries (such as English) develop word awareness, while users of unspaced writing systems (such as Chinese) do not. Previous research showed that English-speaking users of Chinese as a Second Language (CSL) have hi...

Clustering dictionary definitions using Amazon Mechanical Turk

Vocabulary tutors need word sense disambiguation (WSD) in order to provide exercises and assessments that match the sense of words being taught. Using expert annotators to build a WSD training set for all the words supported would be too expensive. Crowdsourcing that task seems to be a good solution. However, a first required step is to define what the possible sense labels to assign to word oc...

Exploring Mental Lexicon in an Efficient and Economic Way: Crowdsourcing Method for Linguistic Experiments

Mental lexicon plays a central role in human language competence and inspires the creation of new lexical resources. The traditional linguistic experiment method which is used to explore mental lexicon has some disadvantages. Crowdsourcing has become a promising method to conduct linguistic experiments which enables us to explore mental lexicon in an efficient and economic way. We focus on the fe...

Influence of suprasegmental features on perceived ethnicity of American politicians

How accurate are listeners at identifying the ethnicities of political figures from one-word samples? Do suprasegmental variables provide a basis for these judgments? Tokens of six lexical items were extracted from speeches by seven male political figures of different stated ethnic identities. In a Mechanical Turk experiment, 94 listeners heard each token twice, then responded to the multiplech...

Publication date: 2017